Learning to Explore and Exploit in POMDPs

نویسندگان

Chenghui Cai

Xuejun Liao

Lawrence Carin

چکیده

A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration exploitation, in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem. Theoretical guarantees are provided concerning the optimality of the balancing of exploration and exploitation. The effectiveness of the method is demonstrated by experimental results on benchmark problems.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Representing hierarchical POMDPs as DBNs for multi-scale map learning

We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). We use this model for representing and learning multi-resolution spatial maps for indoor robot navigation. Our results show that a DBN representation of H-POMDPs can train significantly faster than the original learning algorithm for H-POMDPs or t...

متن کامل

Reinforcement Learning in POMDPs Without Resets

We consider the most realistic reinforcement learning setting in which an agent starts in an unknown environment (the POMDP) and must follow one continuous and uninterrupted chain of experience with no access to “resets” or “offline” simulation. We provide algorithms for general POMDPs that obtain near optimal average reward. One algorithm we present has a convergence rate which depends exponen...

متن کامل

Reinforcement Learning in POMDPs Without Resets

We consider the most realistic reinforcement learning setting in which an agent starts in an unknown environment (the POMDP) and must follow one continuous and uninterrupted chain of experience with no access to “resets” or “offline” simulation. We provide algorithms for general connected POMDPs that obtain near optimal average reward. One algorithm we present has a convergence rate which depen...

متن کامل

Learning in POMDPs with Monte Carlo Tree Search

The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation an...

متن کامل

Factorized Asymptotic Bayesian Policy Search for POMDPs

This paper proposes a novel direct policy search (DPS) method with model selection for partially observed Markov decision processes (POMDPs). DPSs have been standard for learning POMDPs due to their computational efficiency and natural ability to maximize total rewards. An important open challenge for the best use of DPS methods is model selection, i.e., determination of the proper dimensionali...

متن کامل

A (Revised) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) are interesting because they provide a general framework for learning in the presence of multiple forms of uncertainty. We survey methods for learning within the POMDP framework. Because exact methods are intractable we concentrate on approximate methods. We explore two versions of the POMDP training problem: learning when a model of the P...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2009

Learning to Explore and Exploit in POMDPs

نویسندگان

چکیده

منابع مشابه

Representing hierarchical POMDPs as DBNs for multi-scale map learning

Reinforcement Learning in POMDPs Without Resets

Reinforcement Learning in POMDPs Without Resets

Learning in POMDPs with Monte Carlo Tree Search

Factorized Asymptotic Bayesian Policy Search for POMDPs

A (Revised) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes

عنوان ژورنال:

اشتراک گذاری